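I run a highly available k8s cluster locally on virtual machines. It had not been used for a long time, and after booting, kubectl commands no longer worked: connections to the apiserver port on the VIP address were refused.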
┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl get nodes
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
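Checking with ip a whether the configured VIP is active shows that it is not bound on this node, so keepalived on the current node is either broken or not currently holding the VIP.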
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:0e:5d:5f brd ff:ff:ff:ff:ff:ff
    inet 192.168.26.100/24 brd 192.168.26.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe0e:5d5f/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:68:f8:90:26 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
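To script this check instead of reading the ip a output by eye, a minimal sketch could look like the following. The function name has_ipv4 is hypothetical and not part of the original setup; it assumes the one-line-per-address format printed by ip -o -4 addr show, where the fourth field is the address with its prefix length.

```shell
# Hypothetical helper: succeed if the IPv4 address $1 appears in the
# `ip -o -4 addr show` output supplied on stdin.
has_ipv4() {
  awk -v ip="$1" '
    # Field 4 looks like 192.168.26.100/24; compare only the address part.
    { split($4, parts, "/"); if (parts[1] == ip) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, ip -o -4 addr show | has_ipv4 192.168.26.99 || echo "VIP is not bound on this node" would print the message on vms100 in the state shown above.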
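Testing the network layer next: ping to the VIP succeeds. That looks odd at first, but it means another node is currently holding the VIP.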
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ping 192.168.26.99
PING 192.168.26.99 (192.168.26.99) 56(84) bytes of data.
64 bytes from 192.168.26.99: icmp_seq=1 ttl=64 time=0.784 ms
64 bytes from 192.168.26.99: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.26.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.411/0.597/0.784/0.188 ms
SSH to that address to take a look.
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ssh root@192.168.26.99
The authenticity of host '192.168.26.99 (192.168.26.99)' can't be established.
ECDSA key fingerprint is SHA256:BmaDR4pX6G1WgStkR7Lcl7Yg4fhP2d8idUBxW3HEzsA.
ECDSA key fingerprint is MD5:2e:49:16:97:30:90:e3:28:b2:43:2d:64:9d:f2:d4:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.26.99' (ECDSA) to the list of known hosts.
Last login: Wed Nov 15 11:12:11 2023 from 192.168.26.100

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes
Unable to connect to the server: EOF
The connection fails here too, so let's print detailed information about the API call.
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -vv
error: invalid argument "v" for "-v, --v" flag: strconv.ParseInt: parsing "v": invalid syntax
See 'kubectl get --help' for usage.
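The flag syntax is different in newer versions, so take note: as the error shows, -v expects a numeric level, for example -v=1.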
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -v=1
I0209 13:52:56.780335   72398 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl get nodes -v=2
I0209 13:53:16.963102   72533 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep keep
f2a9b9f187a6   0cde578847cc   "/container/tool/run"   12 hours ago   Up 12 hours   k8s_keepalived_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55
822eec55d6af   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"   12 hours ago   Up 12 hours   k8s_POD_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55
Check whether the apiserver is running, since every kubectl command is ultimately a call to kube-apiserver.
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps |grep api
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"   12 hours ago   Up 12 hours   k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41
Sure enough, it is down: only the pause container is running. Find the exited container and look at its last logs.
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps -a | grep api
c9bd413b176f   b09a3dc327be   "kube-apiserver --ad…"   2 minutes ago   Exited (1) 2 minutes ago   k8s_kube-apiserver_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_225
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"   12 hours ago   Up 12 hours   k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41
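The log shows the apiserver failing right after it loads the admission controllers, with only "context deadline exceeded" and no other hint.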
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs --tail -5 c9bd413b176f
I0209 05:51:43.041043       1 server.go:563] external host was not specified, using 192.168.26.102
I0209 05:51:43.042642       1 server.go:161] Version: v1.25.1
I0209 05:51:43.042693       1 server.go:163] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0209 05:51:43.362808       1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0209 05:51:43.363544       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.363560       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0209 05:51:43.364480       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.364499       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
E0209 05:52:03.366417       1 run.go:74] "command failed" err="context deadline exceeded"

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep etcd
43dccee957e0   a8a176a5d5d6   "etcd --advertise-cl…"   About a minute ago   Up About a minute   k8s_etcd_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_156
523a83b11288   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"   12 hours ago   Up 12 hours   k8s_POD_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_41
The etcd logs show certificate-related warnings, so the most likely cause is that the certificates have expired.
┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs 43dccee957e0 | tail -5
...................
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51158","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51148","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51166","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51164","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:173","msg":"serving /health false; no leader"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:86","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
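When the log is long, it can help to summarize which peers are being rejected. The following is a rough sketch, not part of the original troubleshooting session: the function name count_bad_cert_peers is hypothetical, and it assumes the JSON log layout shown above, with a "remote-addr" field of the form ip:port.

```shell
# Hypothetical helper: read etcd JSON log lines on stdin and print
# "<remote-ip> <count>" for every peer rejected with "bad certificate".
count_bad_cert_peers() {
  grep 'tls: bad certificate' |
    # Keep only the IP part of "remote-addr":"<ip>:<port>".
    sed -n 's/.*"remote-addr":"\([^":]*\):[0-9]*".*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

A possible invocation is docker logs 43dccee957e0 2>&1 | count_bad_cert_peers, where 2>&1 is added on the assumption that etcd writes its log to stderr. For the four rejected connections shown above it would print 192.168.26.101 4.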
Checking the certificate confirms it: it expired on January 26, and the log timestamps above show that it is now February 9.
┌──[root@vms102.liruilongs.github.io]-[~]
└─$openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
            Not Before: Jan 26 11:27:49 2023 GMT
            Not After : Jan 26 11:30:26 2024 GMT
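The same check can be run over every certificate file at once rather than one file at a time. The sketch below is an assumption-laden illustration: list_expired_certs is a hypothetical name, and it relies on openssl x509 -checkend 0, which exits non-zero for a certificate that has already expired. It only looks at *.crt files, so certificates embedded in kubeconfig files such as admin.conf are not covered.

```shell
# Hypothetical helper: print every *.crt under directory $1 that openssl
# reports as already expired (or that it cannot read as a certificate).
list_expired_certs() {
  find "$1" -name '*.crt' | sort | while read -r crt; do
    # -checkend 0 exits non-zero when the certificate has already expired;
    # a file that cannot be parsed also lands in this branch.
    if ! openssl x509 -checkend 0 -noout -in "$crt" >/dev/null 2>&1; then
      echo "expired or unreadable: $crt"
    fi
  done
}
```

On a control-plane node this could be called as list_expired_certs /etc/kubernetes/pki.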
kubeadm confirms that every certificate has expired, while the CAs themselves are still valid:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver                  Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver-etcd-client      Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
controller-manager.conf    Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
etcd-healthcheck-client    Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-server                Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Jan 26, 2024 11:30 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Jan 26, 2024 11:30 UTC   <invalid>       ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
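For scripting or monitoring, the expired entries can be pulled out of this report. This is only a sketch under the assumption that expired certificates are marked <invalid> in the RESIDUAL TIME column and that each certificate sits on its own line, as in kubeadm's terminal output; the function name expired_from_kubeadm is hypothetical.

```shell
# Hypothetical helper: read `kubeadm certs check-expiration` output on
# stdin and print the name of each certificate marked <invalid>.
expired_from_kubeadm() {
  awk '$0 ~ /<invalid>/ { print $1 }'
}
```

It could be used as kubeadm certs check-expiration 2>/dev/null | expired_from_kubeadm, which prints one certificate name per line and nothing once all certificates are valid.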
Renew all certificates on this node with kubeadm:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
Checking again shows the new expiration dates:

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver                  Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver-etcd-client      Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 08, 2025 06:18 UTC   364d            ca                      no
controller-manager.conf    Feb 08, 2025 06:18 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-server                Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 08, 2025 06:18 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 08, 2025 06:18 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
Renew the certificates on the other master nodes as well, here with Ansible from vms100, skipping 192.168.26.102 where the renewal has already been done:

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m shell -a "kubeadm certs renew all" -i host.yaml --limit '!192.168.26.102'
192.168.26.101 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
192.168.26.100 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.